For this assignment, you will be adding code to the Python script file neuralnetworksA4.py that you will download from here. The file neuralnetworksA4_initial.py currently contains the implementation of the NeuralNetwork class that is a solution to A3. It also contains an incomplete implementation of the subclass NeuralNetworkClassifier that extends NeuralNetwork, as discussed in class. Copy or rename this file to neuralnetworksA4.py and complete the implementation of NeuralNetworkClassifier. Your NeuralNetworkClassifier implementation should rely on functions inherited from NeuralNetwork as much as possible. Your neuralnetworksA4.py file (notice it is plural) will now contain two classes, NeuralNetwork and NeuralNetworkClassifier. The tar file neuralnetworksA4.tar also contains optimizers.py, the version of our optimizer code that you must use in this assignment.

In NeuralNetworkClassifier you will replace the _error_f function with one called _neg_log_likelihood_f. You will also have to define a new version of the _gradient_f function for NeuralNetworkClassifier.
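As a guide only, here is a minimal sketch of what these two methods could look like. It assumes the _forward, _softmax, and _backpropagate methods and the standardized X and indicator-variable T arguments suggested by the debugging output shown later in this notebook; it is not necessarily the exact structure expected by the provided code.

import sys
import numpy as np

def _neg_log_likelihood_f(self, X, T):
    # Sketch of a NeuralNetworkClassifier method: mean negative log likelihood of the target classes.
    Ys = self._forward(X)        # last element of the returned list holds the output-layer values
    Y = self._softmax(Ys[-1])    # class probabilities
    return -np.mean(T * np.log(Y + sys.float_info.epsilon))

def _gradient_f(self, X, T):
    # Sketch of the matching gradient: backpropagate the derivative of the mean
    # negative log likelihood with respect to the output-layer values.
    Ys = self._forward(X)
    Y = self._softmax(Ys[-1])
    n_samples, n_outputs = T.shape
    delta = -(T - Y) / (n_samples * n_outputs)
    return self._backpropagate(delta)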
Here are some example tests.
%load_ext autoreload
%autoreload 2
import numpy as np
import matplotlib.pyplot as plt
Import your completed neuralnetworksA4.py code that defines the NeuralNetwork and NeuralNetworkClassifier classes.
import neuralnetworksA4 as nn
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
T = np.array([[0], [1], [1], [0]])
X, T
(array([[0, 0], [1, 0], [0, 1], [1, 1]]), array([[0], [1], [1], [0]]))
np.random.seed(111)
nnet = nn.NeuralNetworkClassifier(2, [10], 2)
print(nnet)
NeuralNetworkClassifier(2, [10], 2) has not been trained.
nnet.Ws
[array([[ 0.12952296, -0.38212533, -0.07383268, 0.31091752, -0.23633798, -0.40511172, -0.55139454, -0.09211682, -0.30174387, -0.18745848], [ 0.56662595, -0.3028474 , -0.48359706, 0.19583749, 0.13999926, -0.26066957, -0.03900416, -0.44067096, -0.49195143, 0.46277416], [ 0.33943873, 0.39325596, 0.36397022, 0.56690583, 0.08922813, 0.36230683, -0.09085429, -0.5456561 , -0.05295844, -0.45573018]]), array([[0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.]])]
The _error_f function is replaced with _neg_log_likelihood_f. If you add some print statements in your _neg_log_likelihood_f function, you can compare your output to the following results.
nnet.set_debug(True)
Debugging information will now be printed.
nnet.train(X, T, X, T, n_epochs=1, method='sgd', learning_rate=0.01)
In _neg_log_likelihood_f: arguments are X (standardized): [[-1. -1.] [ 1. -1.] [-1. 1.] [ 1. 1.]] T (indicator variables): [[1. 0.] [0. 1.] [0. 1.] [1. 0.]] Result of call to self._forward is: [array([[-1., -1.], [ 1., -1.], [-1., 1.], [ 1., 1.]]), array([[-0.65071726, -0.44024437, 0.04576217, -0.42339865, -0.43460927, -0.46740829, -0.39822372, 0.71346703, 0.23848393, -0.19208626], [ 0.34231299, -0.79254131, -0.72655903, -0.06007838, -0.18346578, -0.77314041, -0.46175878, 0.0128676 , -0.62959015, 0.62370478], [-0.09735492, 0.30405172, 0.64909581, 0.5928089 , -0.27947188, 0.2144819 , -0.53935438, -0.19458859, 0.13639376, -0.80263068], [ 0.77613968, -0.28371417, -0.19108161, 0.79083647, -0.00711046, -0.294489 , -0.59233336, -0.79262132, -0.68931726, -0.1784822 ]]), array([[0., 0.], [0., 0.], [0., 0.], [0., 0.]])] Result of _softmax is: [[0.5 0.5] [0.5 0.5] [0.5 0.5] [0.5 0.5]] Result of np.log(Y + sys.float_info.epsilon) is: [[-0.69314718 -0.69314718] [-0.69314718 -0.69314718] [-0.69314718 -0.69314718] [-0.69314718 -0.69314718]] _neg_log_likelihood_f returns: 0.3465735902799724 in _backpropagate: first delta calculated is [[-0.0625 0.0625] [ 0.0625 -0.0625] [ 0.0625 -0.0625] [-0.0625 0.0625]] in _backpropagate: next delta is [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]] in _backpropagate: next delta is [[0. 0.] [0. 0.] [0. 0.] [0. 0.]] In _neg_log_likelihood_f: arguments are X (standardized): [[-1. -1.] [ 1. -1.] [-1. 1.] [ 1. 1.]] T (indicator variables): [[1. 0.] [0. 1.] [0. 1.] [1. 0.]] Result of call to self._forward is: [array([[-1., -1.], [ 1., -1.], [-1., 1.], [ 1., 1.]]), array([[-0.65071726, -0.44024437, 0.04576217, -0.42339865, -0.43460927, -0.46740829, -0.39822372, 0.71346703, 0.23848393, -0.19208626], [ 0.34231299, -0.79254131, -0.72655903, -0.06007838, -0.18346578, -0.77314041, -0.46175878, 0.0128676 , -0.62959015, 0.62370478], [-0.09735492, 0.30405172, 0.64909581, 0.5928089 , -0.27947188, 0.2144819 , -0.53935438, -0.19458859, 0.13639376, -0.80263068], [ 0.77613968, -0.28371417, -0.19108161, 0.79083647, -0.00711046, -0.294489 , -0.59233336, -0.79262132, -0.68931726, -0.1784822 ]]), array([[ 2.81243949e-04, -2.81243949e-04], [ 1.30260852e-04, -1.30260852e-04], [-7.34786885e-05, 7.34786885e-05], [-1.04105784e-04, 1.04105784e-04]])] Result of _softmax is: [[0.50014062 0.49985938] [0.50006513 0.49993487] [0.49996326 0.50003674] [0.49994795 0.50005205]] Result of np.log(Y + sys.float_info.epsilon) is: [[-0.69286598 -0.69342846] [-0.69301693 -0.69327745] [-0.69322066 -0.6930737 ] [-0.69325129 -0.69304308]] _neg_log_likelihood_f returns: 0.34655855279873804 In _neg_log_likelihood_f: arguments are X (standardized): [[-1. -1.] [ 1. -1.] [-1. 1.] [ 1. 1.]] T (indicator variables): [[1. 0.] [0. 1.] [0. 1.] [1. 
0.]] Result of call to self._forward is: [array([[-1., -1.], [ 1., -1.], [-1., 1.], [ 1., 1.]]), array([[-0.65071726, -0.44024437, 0.04576217, -0.42339865, -0.43460927, -0.46740829, -0.39822372, 0.71346703, 0.23848393, -0.19208626], [ 0.34231299, -0.79254131, -0.72655903, -0.06007838, -0.18346578, -0.77314041, -0.46175878, 0.0128676 , -0.62959015, 0.62370478], [-0.09735492, 0.30405172, 0.64909581, 0.5928089 , -0.27947188, 0.2144819 , -0.53935438, -0.19458859, 0.13639376, -0.80263068], [ 0.77613968, -0.28371417, -0.19108161, 0.79083647, -0.00711046, -0.294489 , -0.59233336, -0.79262132, -0.68931726, -0.1784822 ]]), array([[ 2.81243949e-04, -2.81243949e-04], [ 1.30260852e-04, -1.30260852e-04], [-7.34786885e-05, 7.34786885e-05], [-1.04105784e-04, 1.04105784e-04]])] Result of _softmax is: [[0.50014062 0.49985938] [0.50006513 0.49993487] [0.49996326 0.50003674] [0.49994795 0.50005205]] Result of np.log(Y + sys.float_info.epsilon) is: [[-0.69286598 -0.69342846] [-0.69301693 -0.69327745] [-0.69322066 -0.6930737 ] [-0.69325129 -0.69304308]] _neg_log_likelihood_f returns: 0.34655855279873804 SGD: Epoch 1 Likelihood = Train 0.70712 Validate 0.70712
NeuralNetworkClassifier(2, [10], 2)
print(nnet)
NeuralNetworkClassifier(2, [10], 2) trained for 1 epochs with final likelihoods of 0.7071 train 0.7071 validation. Network weights set to best weights from epoch 0 for validation likelihood of 0.7071174143714485.
Now if you turn off debugging, most print statements will be suppressed so you can run for more epochs without tons of output.
nnet.set_debug(False)
No debugging information will be printed.
The use() function returns two numpy arrays. The first contains the class prediction for each sample, drawn from the set of unique values in the T passed into the train() function. The second contains the probability of each class for each sample, with a column for each unique value in T.
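For intuition only, here is a small illustration (not the required implementation) of how the class predictions can be recovered from the softmax probabilities, using the probabilities printed below and the unique values of T seen by train():

classes = np.unique(T)            # array([0, 1])
probs = np.array([[0.50014062, 0.49985938],
                  [0.50006513, 0.49993487],
                  [0.49996326, 0.50003674],
                  [0.49994795, 0.50005205]])
predictions = classes[np.argmax(probs, axis=1)].reshape(-1, 1)
predictions                       # [[0], [0], [1], [1]], matching the first array returned by nnet.use(X) below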
nnet.use(X)
(array([[0], [0], [1], [1]]), array([[0.50014062, 0.49985938], [0.50006513, 0.49993487], [0.49996326, 0.50003674], [0.49994795, 0.50005205]]))
def percent_correct(Y, T):
    return np.mean(T == Y) * 100
percent_correct(nnet.use(X)[0], T)
50.0
The XOR problem was used early in the history of neural networks as a problem that cannot be solved with a linear model. Let's try it.
nnet = nn.NeuralNetworkClassifier(2, [], 2) # [], so no hidden layers, just a linear model
nnet.train(X, T, X, T, 100, method='sgd', learning_rate=0.1)
SGD: Epoch 10 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 20 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 30 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 40 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 50 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 60 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 70 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 80 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 90 Likelihood = Train 0.70711 Validate 0.70711
SGD: Epoch 100 Likelihood = Train 0.70711 Validate 0.70711
NeuralNetworkClassifier(2, [], 2)
print(nnet)
NeuralNetworkClassifier(2, [], 2) trained for 100 epochs with final likelihoods of 0.7071 train 0.7071 validation. Network weights set to best weights from epoch 0 for validation likelihood of 0.7071067811865477.
nnet.use(X)
(array([[0], [0], [0], [0]]), array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]))
percent_correct(nnet.use(X)[0], T)
50.0
Now try with one hidden layer containing one unit.
nnet = nn.NeuralNetworkClassifier(2, [1], 2)
nnet.train(X, T, X, T, 100, method='adamw', learning_rate=0.1)
AdamW: Epoch 10 Likelihood = Train 0.75498 Validate 0.75498
AdamW: Epoch 20 Likelihood = Train 0.78091 Validate 0.78091
AdamW: Epoch 30 Likelihood = Train 0.78535 Validate 0.78535
AdamW: Epoch 40 Likelihood = Train 0.78640 Validate 0.78640
AdamW: Epoch 50 Likelihood = Train 0.78677 Validate 0.78677
AdamW: Epoch 60 Likelihood = Train 0.78695 Validate 0.78695
AdamW: Epoch 70 Likelihood = Train 0.78705 Validate 0.78705
AdamW: Epoch 80 Likelihood = Train 0.78712 Validate 0.78712
AdamW: Epoch 90 Likelihood = Train 0.78718 Validate 0.78718
AdamW: Epoch 100 Likelihood = Train 0.78723 Validate 0.78723
NeuralNetworkClassifier(2, [1], 2)
Y, probs = nnet.use(X)
print(Y)
percent_correct(Y, T)
[[0] [1] [0] [0]]
75.0
One hidden unit didn't work. Let's try five hidden units.
nnet = nn.NeuralNetworkClassifier(2, [5], 2)
nnet.train(X, T, X, T, 400, method='adamw')
AdamW: Epoch 40 Likelihood = Train 0.99988 Validate 0.99988
AdamW: Epoch 80 Likelihood = Train 0.99992 Validate 0.99992
AdamW: Epoch 120 Likelihood = Train 0.99993 Validate 0.99993
AdamW: Epoch 160 Likelihood = Train 0.99994 Validate 0.99994
AdamW: Epoch 200 Likelihood = Train 0.99995 Validate 0.99995
AdamW: Epoch 240 Likelihood = Train 0.99995 Validate 0.99995
AdamW: Epoch 280 Likelihood = Train 0.99996 Validate 0.99996
AdamW: Epoch 320 Likelihood = Train 0.99996 Validate 0.99996
AdamW: Epoch 360 Likelihood = Train 0.99996 Validate 0.99996
AdamW: Epoch 400 Likelihood = Train 0.99997 Validate 0.99997
NeuralNetworkClassifier(2, [5], 2)
print(nnet)
NeuralNetworkClassifier(2, [5], 2) trained for 400 epochs with final likelihoods of 1.0000 train 1.0000 validation. Network weights set to best weights from epoch 399 for validation likelihood of 0.9999679983795212.
Y, probs = nnet.use(X)
print(Y)
percent_correct(Y, T)
[[0] [1] [1] [0]]
100.0
A second way to evaluate a classifier is to calculate a confusion matrix. This shows the percent accuracy for each class, and also shows which classes are predicted in error.
Here is a function you can use to show a confusion matrix.
import pandas

def confusion_matrix(Y_classes, T):
    class_names = np.unique(T)
    table = []
    for true_class in class_names:
        row = []
        for Y_class in class_names:
            row.append(100 * np.mean(Y_classes[T == true_class] == Y_class))
        table.append(row)
    conf_matrix = pandas.DataFrame(table, index=class_names, columns=class_names)
    print('Percent Correct')
    return conf_matrix.style.background_gradient(cmap='Blues').format("{:.1f}")
nnet.best_epoch
399
nnet.use(X)
(array([[0], [1], [1], [0]]), array([[9.99948915e-01, 5.10846472e-05], [2.59792751e-05, 9.99974021e-01], [1.10633564e-04, 9.99889366e-01], [9.99931691e-01, 6.83094785e-05]]))
confusion_matrix(nnet.use(X)[0], T)
Percent Correct
|   | 0 | 1 |
|---|---|---|
| 0 | 100.0 | 0.0 |
| 1 | 0.0 | 100.0 |
for method in ('sgd', 'adamw', 'scg'):
    nnet = nn.NeuralNetworkClassifier(2, [20, 20], 2)
    nnet.train(X, T, X, T, 400, method=method, learning_rate=0.1, momentum=0.9, verbose=False)
    pc = percent_correct(nnet.use(X)[0], T)
    print(f'{method} % Correct: {pc:.0f}')
sgd % Correct: 100
adamw % Correct: 100
scg % Correct: 100
Apply NeuralNetworkClassifier to Handwritten Digits

Apply your NeuralNetworkClassifier to the MNIST digits dataset. First, make sure your solution works on the following examples. Then complete make_mnist_classifier and use it as instructed below.
import pickle
import gzip
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
Xtrain = train_set[0]
Ttrain = train_set[1].reshape(-1, 1)
Xval = valid_set[0]
Tval = valid_set[1].reshape(-1, 1)
Xtest = test_set[0]
Ttest = test_set[1].reshape(-1, 1)
print(Xtrain.shape, Ttrain.shape, Xval.shape, Tval.shape, Xtest.shape, Ttest.shape)
(50000, 784) (50000, 1) (10000, 784) (10000, 1) (10000, 784) (10000, 1)
28*28
784
def draw_digit(image, label, predicted_label=None):
    plt.imshow(-image.reshape(28, 28), cmap='gray')
    plt.xticks([])
    plt.yticks([])
    plt.axis('off')
    title = str(label)
    color = 'black'
    if predicted_label is not None:
        title += ' as {}'.format(predicted_label)
        if predicted_label != label:
            color = 'red'
    plt.title(title, color=color)
plt.figure(figsize=(7, 7))
for i in range(100):
    plt.subplot(10, 10, i + 1)
    draw_digit(Xtrain[i], Ttrain[i, 0])
plt.tight_layout()
nnet = nn.NeuralNetworkClassifier(784, [12], 10)
# nnet = nn.NeuralNetworkClassifier(784, [100, 50, 20, 50], 10)
nnet.train(Xtrain, Ttrain, Xval, Tval, n_epochs=100, batch_size=-1, method='scg') # , learning_rate=0.1)
print(nnet)
SCG: Epoch 10 Likelihood= Train 0.94325 Validate 0.94576
SCG: Epoch 20 Likelihood= Train 0.96238 Validate 0.96252
SCG: Epoch 30 Likelihood= Train 0.97022 Validate 0.96881
SCG: Epoch 40 Likelihood= Train 0.97452 Validate 0.97166
SCG: Epoch 50 Likelihood= Train 0.97693 Validate 0.97244
SCG: Epoch 60 Likelihood= Train 0.97899 Validate 0.97315
SCG: Epoch 70 Likelihood= Train 0.98061 Validate 0.97320
SCG: Epoch 80 Likelihood= Train 0.98177 Validate 0.97281
SCG: Epoch 90 Likelihood= Train 0.98235 Validate 0.97299
SCG: Epoch 100 Likelihood= Train 0.98321 Validate 0.97263
NeuralNetworkClassifier(784, [12], 10) trained for 100 epochs with final likelihoods of 0.9832 train 0.9726 validation. Network weights set to best weights from epoch 64 for validation likelihood of 0.973500774228357.
def first_100_tests(nnet, Xtest, Ttest):
    plt.figure(figsize=(7, 7))
    Ytest, _ = nnet.use(Xtest[:100, :])
    for i in range(100):
        plt.subplot(10, 10, i + 1)
        draw_digit(Xtest[i], Ttest[i, 0], Ytest[i, 0])
    plt.tight_layout()
first_100_tests(nnet, Xtest, Ttest)
Experiment with the three different optimization methods, at least three hidden layer structures including [], two learning rates, and two numbers of epochs. Use verbose=False as an argument to train(). For scg, ignore the learning rate loop. Print a single line for each run showing method, number of epochs, learning rate, hidden layer structure, and percent correct for training, validation, and testing data. Here is an example line:
sgd 10 0.1 [] 77.16 79.22 79.05
Use a pandas.DataFrame to show your results with columns labeled correctly.
# ...
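One possible skeleton for these experiments is sketched below, only as an illustration. The hidden layer structures, learning rates, and epoch counts shown are placeholders, not recommended settings, and whether train() needs a batch_size argument for MNIST depends on your implementation.

import pandas as pd

results = []
for method in ('sgd', 'adamw', 'scg'):
    for n_epochs in (10, 100):                                        # placeholder values
        learning_rates = (0.1,) if method == 'scg' else (0.01, 0.1)   # scg ignores the learning rate
        for lr in learning_rates:
            for hiddens in ([], [10], [20, 20]):                      # placeholder structures
                nnet = nn.NeuralNetworkClassifier(Xtrain.shape[1], hiddens, len(np.unique(Ttrain)))
                nnet.train(Xtrain, Ttrain, Xval, Tval, n_epochs,
                           method=method, learning_rate=lr, verbose=False)
                pcs = [percent_correct(nnet.use(X)[0], T)
                       for X, T in ((Xtrain, Ttrain), (Xval, Tval), (Xtest, Ttest))]
                print(method, n_epochs, lr, hiddens, *[f'{pc:.2f}' for pc in pcs])
                results.append([method, n_epochs, lr, hiddens] + pcs)

pd.DataFrame(results, columns=['Method', 'Epochs', 'Learning Rate', 'Hidden Layers',
                               'Train %', 'Validation %', 'Test %'])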
Complete the following function.
def make_mnist_classifier(Xtrain, Ttrain, Xvalidate, Tvalidate, Xtest, Ttest,
                          n_hiddens_each_layer, n_epochs, batch_size=-1,
                          method='adamw', learning_rate=0.1, momentum=0.9):

    from IPython.display import display  # to display the confusion matrix in the last step of this function

    # Create NeuralNetworkClassifier object
    # ...

    # Train it.
    # ...

    # Plot the performance trace with legend (f'{method} Train Data', f'{method} Validation Data')
    # Also plot a vertical line at the best epoch, using code like plt.axvline(nnet.best_epoch, lw=3, alpha=0.5)
    # ...

    # Show the results on the first 100 test images.
    # ...

    plt.show()

    # Print the network
    print(nnet)

    # Print percent correct on training data, validation data and test data.
    # ...

    # Print a confusion matrix using the trained neural network applied to the testing data.
    # display( ... )
Here is an example of what your function should produce.
hiddens = [5]
n_epochs = 40
batch_size = -1
method = 'adamw'
learning_rate = 0.1
make_mnist_classifier(Xtrain, Ttrain, Xval, Tval, Xtest, Ttest, hiddens, n_epochs, batch_size, method, learning_rate)
AdamW: Epoch 4 Likelihood = Train 0.86147 Validate 0.86344
AdamW: Epoch 8 Likelihood = Train 0.89506 Validate 0.89917
AdamW: Epoch 12 Likelihood = Train 0.90809 Validate 0.91218
AdamW: Epoch 16 Likelihood = Train 0.91348 Validate 0.91781
AdamW: Epoch 20 Likelihood = Train 0.91704 Validate 0.92071
AdamW: Epoch 24 Likelihood = Train 0.91949 Validate 0.92272
AdamW: Epoch 28 Likelihood = Train 0.92167 Validate 0.92427
AdamW: Epoch 32 Likelihood = Train 0.92380 Validate 0.92533
AdamW: Epoch 36 Likelihood = Train 0.92530 Validate 0.92575
AdamW: Epoch 40 Likelihood = Train 0.92657 Validate 0.92669
NeuralNetworkClassifier(784, [5], 10) trained for 40 epochs with final likelihoods of 0.9266 train 0.9267 validation. Network weights set to best weights from epoch 39 for validation likelihood of 0.9266944118121945.
Training 73.590 % correct
Validation 73.620 % correct
Testing 72.500 % correct
Percent Correct
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 93.0 | 0.1 | 1.0 | 1.2 | 0.3 | 1.6 | 1.8 | 0.1 | 0.8 | 0.0 |
| 1 | 1.2 | 92.1 | 0.6 | 1.1 | 0.4 | 0.0 | 0.3 | 2.5 | 1.9 | 0.0 |
| 2 | 4.7 | 1.4 | 68.0 | 15.5 | 1.1 | 1.6 | 1.8 | 0.6 | 5.4 | 0.0 |
| 3 | 1.5 | 0.4 | 4.8 | 78.5 | 0.6 | 0.8 | 1.1 | 2.6 | 9.8 | 0.0 |
| 4 | 0.1 | 0.1 | 0.4 | 0.6 | 91.1 | 1.6 | 2.1 | 2.3 | 0.6 | 0.9 |
| 5 | 5.8 | 0.4 | 4.0 | 3.1 | 10.1 | 57.5 | 4.5 | 1.6 | 12.6 | 0.3 |
| 6 | 3.8 | 0.2 | 2.5 | 0.5 | 0.9 | 1.1 | 90.8 | 0.0 | 0.1 | 0.0 |
| 7 | 0.5 | 1.2 | 0.3 | 8.8 | 1.9 | 0.8 | 0.1 | 83.3 | 2.4 | 0.8 |
| 8 | 1.7 | 0.2 | 7.6 | 9.7 | 1.6 | 9.7 | 0.4 | 3.4 | 65.7 | 0.0 |
| 9 | 1.0 | 0.0 | 0.3 | 1.3 | 7.2 | 3.7 | 0.3 | 83.0 | 0.8 | 2.5 |
Use your function to show results with the three different optimization methods, using values for the hidden layer structure, learning rate, and number of epochs that work well, meaning over 90% correct on test data.
# ...
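For example, a call pattern like the following could be used. The hidden layer structure, number of epochs, and learning rate here are placeholders, not recommended values.

for method in ('sgd', 'adamw', 'scg'):
    make_mnist_classifier(Xtrain, Ttrain, Xval, Tval, Xtest, Ttest,
                          [50, 20], 100, batch_size=-1,
                          method=method, learning_rate=0.1)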
Discuss your results. In your discussion, include observations about which method achieves the best result, which method seems to do best with fewer epochs, what common classification mistakes are made as shown in your confusion matrices, and whether larger networks (more layers, more units) work better than small networks.
write your comments here
Tar or zip your Jupyter notebook (A4solution.ipynb) and your Python script file (neuralnetworksA4.py) into a file named A4.tar or A4.zip. Check in the tar or zip file in Canvas.

Download A4grader.zip and extract A4grader.py before running the following cell. Remember, you are expected to design and run your own tests in addition to the tests provided in A4grader.py.
%run -i A4grader.py
======================= Code Execution ======================= ============================ import neuralnetworksA4 as nn ============================ neuralnetworksA4.py defines NeuralNetwork and NeuralNetworkClassifier ================================================================================ Testing this for 10 points: # Checking that NeuralNetworkClassifier is subcless of NeuralNetwork # and test result with issubclass(nn.NeuralNetworkClassifier, nn.NeuralNetwork) ---------------------------------------------------------------------- ---- 10/10 points. Correct class inheritance. ---------------------------------------------------------------------- ================================================================================ Testing this for 5 points: # Checking if the _forward function in NeuralNetworkClassifier is inherited from NeuralNetwork import inspect forward_func = [f for f in inspect.classify_class_attrs(nn.NeuralNetworkClassifier) if (f.name == 'forward' or f.name == '_forward')] # and test result with forward_func[0].defining_class == nn.NeuralNetwork ---------------------------------------------------------------------- ---- 5/5 points. NeuralNetworkClassifier _forward function correctly inherited from NeuralNetwork. ---------------------------------------------------------------------- ================================================================================ Testing this for 5 points: # Checking if __str__ is overridden in NeuralNetworkClassifier import inspect str_func = [f for f in inspect.classify_class_attrs(nn.NeuralNetworkClassifier) if (f.name == '__str__')] # and test result with str_func[0].defining_class == nn.NeuralNetworkClassifier ---------------------------------------------------------------------- ---- 5/5 points. NeuralNetworkClassifier __str__ function correctly overridden in NeuralNetworkClassifier. ---------------------------------------------------------------------- ================================================================================ Testing this for 5 points: # Checking if _gradient_f in NeuralNetworkClassifier is defined (overridden) in NeuralNetworkClassifier import inspect str_func = [f for f in inspect.classify_class_attrs(nn.NeuralNetworkClassifier) if (f.name == '_gradient_f')] # and test result with str_func[0].defining_class == nn.NeuralNetworkClassifier ---------------------------------------------------------------------- ---- 5/5 points. NeuralNetworkClassifier _gradient_f function correctly defined in NeuralNetworkClassifier. ---------------------------------------------------------------------- ================================================================================ Testing this for 5 points: # Checking if _backpropagate in NeuralNetworkClassifier is inherited from NeuralNetwork import inspect str_func = [f for f in inspect.classify_class_attrs(nn.NeuralNetworkClassifier) if (f.name == '_backpropagate')] # and test result with str_func[0].defining_class == nn.NeuralNetwork ---------------------------------------------------------------------- ---- 5/5 points. NeuralNetworkClassifier _backpropagate function correctly inherited from NeuralNetwork. 
---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: nnet = nn.NeuralNetworkClassifier(2, [], 5) W_shapes = [W.shape for W in nnet.Ws] correct = [(3, 5)] # and test result with correct == W_shapes ---------------------------------------------------------------------- ---- 10/10 points. W_shapes is correct value of [(3, 5)]. ---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: nnet = nn.NeuralNetworkClassifier(2, [], 5) G_shapes = [G.shape for G in nnet.Grads] correct = [(3, 5)] # and test result with correct == G_shapes ---------------------------------------------------------------------- ---- 10/10 points. G_shapes is correct value of [(3, 5)] ---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: np.random.seed(42) X = np.random.uniform(0, 1, size=(100, 2)) T = (np.abs(X[:, 0:1] - 0.5) > 0.3).astype(int) nnet = nn.NeuralNetworkClassifier(2, [10, 5], len(np.unique(T))) nnet.train(X, T, X, T, 20, method='scg') last_error = nnet.get_performance_trace()[-1] correct = 0.9297448356260026 SCG: Epoch 2 Likelihood= Train 0.70967 Validate 0.70967 SCG: Epoch 4 Likelihood= Train 0.71795 Validate 0.71795 SCG: Epoch 6 Likelihood= Train 0.73195 Validate 0.73195 SCG: Epoch 8 Likelihood= Train 0.79354 Validate 0.79354 SCG: Epoch 10 Likelihood= Train 0.84909 Validate 0.84909 SCG: Epoch 12 Likelihood= Train 0.88401 Validate 0.88401 SCG: Epoch 14 Likelihood= Train 0.90492 Validate 0.90492 SCG: Epoch 16 Likelihood= Train 0.94989 Validate 0.94989 SCG: Epoch 18 Likelihood= Train 0.96324 Validate 0.96324 SCG: Epoch 20 Likelihood= Train 0.97509 Validate 0.97509 # and test result with np.allclose(last_error, correct, atol=0.1) ---------------------------------------------------------------------- ---- 10/10 points. Correct values in performance_trace. ---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: np.random.seed(43) X = np.random.uniform(0, 1, size=(100, 2)) T = (np.abs(X[:, 0:1] - X[:, 1:2]) < 0.5).astype(int) T[T == 0] = 10 T[T == 1] = 20 # Unique class labels are now 10 and 20! nnet = nn.NeuralNetworkClassifier(2, [10, 5], 2) nnet.train(X, T, X, T, 200, method='scg') classes, prob = nnet.use(X) correct_classes = np.array([[20], [20], [10], [20], [10], [20], [20], [10], [20], [10], [20], [10], [20], [10], [20], [10], [20], [20], [20], [20]]) SCG: Epoch 20 Likelihood= Train 0.96174 Validate 0.96174 SCG: Epoch 40 Likelihood= Train 0.98515 Validate 0.98515 SCG: Epoch 60 Likelihood= Train 0.99438 Validate 0.99438 SCG: Epoch 80 Likelihood= Train 0.99730 Validate 0.99730 SCG: Epoch 100 Likelihood= Train 0.99730 Validate 0.99730 SCG: Epoch 120 Likelihood= Train 0.99786 Validate 0.99786 SCG: Epoch 140 Likelihood= Train 0.99851 Validate 0.99851 SCG: Epoch 160 Likelihood= Train 0.99860 Validate 0.99860 SCG: Epoch 180 Likelihood= Train 0.99893 Validate 0.99893 SCG: Epoch 200 Likelihood= Train 0.99936 Validate 0.99936 # and test result with np.allclose(classes, correct_classes, atol=0.1) ---------------------------------------------------------------------- ---- 10/10 points. Correct values in classes. 
---------------------------------------------------------------------- ================================================================================ Testing this for 10 points: correct_prob = np.array([[7.87686254e-10, 9.99999999e-01], [2.64073742e-10, 1.00000000e+00], [1.00000000e+00, 2.17739214e-11], [2.37507101e-10, 1.00000000e+00], [1.00000000e+00, 5.72602779e-13], [2.63951189e-10, 1.00000000e+00], [3.07141256e-10, 1.00000000e+00], [9.99999995e-01, 5.31601303e-09], [5.18960837e-10, 9.99999999e-01], [1.00000000e+00, 5.29910868e-15], [2.31535786e-10, 1.00000000e+00], [1.00000000e+00, 4.76259538e-17], [2.31088925e-10, 1.00000000e+00], [1.00000000e+00, 3.30767340e-16], [3.09810289e-10, 1.00000000e+00], [9.99999999e-01, 7.34252931e-10], [2.31089312e-10, 1.00000000e+00], [2.32724737e-10, 1.00000000e+00], [2.69944404e-10, 1.00000000e+00], [2.59802216e-10, 1.00000000e+00]]) # and test result with np.allclose(probs, correct_probs, atol=0.1) ---------------------------------------------------------------------- ---- 10/10 points. Correct values in probs. ---------------------------------------------------------------------- ====================================================================== A4 Execution Grade is 80 / 80 ====================================================================== -- / 5 points. Experiment with the three different optimization methods, at least three hidden layer structures including [], two learning rates, and two numbers of epochs. Use verbose=False as an argument to train(). For scg, ignore the learning rate loop. Print a single line for each run showing method, number of epochs, learning rate, hidden layer structure, and percent correct for training, validation, and testing data. __ / 5 points. Function make_mnist_classifier defined and used correctly. __ / 5 points. Discuss your results. In your discussion, include observations about which method achieves the best result, which method seems to do best with fewer epochs, what common classification mistakes are made as shown in your confusion matrices, and do larger networks (more layers, more units) work better than small networks? __ / 5 points. Train a network with values for method, learning rate, number of epochs, and a hidden layer structure with no more than 100 units in the first layer that you found work well. Extract the weight matrix from the first layer. Now, for each unit (column in the weight matrix) ignore the first row of bias weights and reshape the remaining weights into a 28 x 28 image for each unit and display them. Complete the function to draw the weight matrix for one unit using draw_digit as a guide, then use it in a loop to draw the weight matrices for each unit in the first layer of your network. Discuss what you see. Describe some of the images as patterns that could be useful for classifying particular digits. ====================================================================== A4 Results and Discussion Grade is ___ / 20 ====================================================================== ====================================================================== A4 FINAL GRADE is _ / 100 ====================================================================== Extra Credit (2 points possible): Extra Credit for 1 point: Repeat the above experiments with a different classification data set. Randonly partition your data into training, validaton and test parts if not already provided. Write in markdown cells descriptions of the data and your results. of the data and your results. 
Extra Credit for 1 point: Train a network with values for method, learning rate, number of epochs, and a hidden layer structure with no more than 100 units in the first layer that you found work well. Extract the weight matrix from the first layer. Now, for each unit (column in the weight matrix) ignore the first row of bias weights and reshape the remaining weights into a 28 x 28 image for each unit and display them. Complete the following function to draw the weight matrix for one unit using `draw_digit` as a guide, then use it in a loop to draw the weight matrices for each unit in the first layer of your network. Discuss what you see. Describe some of the images as patterns that could be useful for classifying particular digits. A4 EXTRA CREDIT is 0 / 2
Train a network with values for method, learning rate, number of epochs, and a hidden layer structure with no more than 100 units in the first layer that you found work well. Extract the weight matrix from the first layer. Now, for each unit (column in the weight matrix), ignore the first row of bias weights and reshape the remaining weights into a 28 x 28 image for each unit and display them. Complete the following function to draw the weight matrix for one unit using draw_digit as a guide, then use it in a loop to draw the weight matrices for each unit in the first layer of your network. Discuss what you see. Describe some of the images as patterns that could be useful for classifying particular digits.
def draw_weight_matrix(W, unit_index = 0):
"""W is matrix of weights, with shape 784 x n_units in first layer of neural network"""
...
W = nnet.Ws[0]
n_units = W.shape[1]
n_plot_rows = round(np.sqrt(n_units) + 0.5)
n_plot_cols = n_plot_rows
plt.figure(figsize=(10, 10))
for i in range(n_units):
...
plt.tight_layout()
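Here is a hedged sketch of one way the two elided pieces above could be filled in, assuming the W passed in is nnet.Ws[0] with the bias weights in its first row (dropped before reshaping, as described above); adapt it to however your weights are stored.

def draw_weight_matrix(W, unit_index=0):
    # Sketch: drop the bias weight (assumed to be in the first row of W) and
    # show the remaining 784 weights as a 28 x 28 image, following draw_digit.
    plt.imshow(-W[1:, unit_index].reshape(28, 28), cmap='gray')
    plt.xticks([])
    plt.yticks([])
    plt.axis('off')
    plt.title(str(unit_index))

# Inside the loop above, each unit could then be drawn with, for example:
#     plt.subplot(n_plot_rows, n_plot_cols, i + 1)
#     draw_weight_matrix(W, i)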